

3.2 average click rank (vs 23 in existing tools)
searches across all datasets
faster from search to analysis
Role
Sole Product Designer
End-to-end design, user research, search relevance optimization
Team
2 Engineers, 1 Product Ops, 1 PM
Timeline
Feb - Jun 2024



Context
Accuracy is non-negotiable in legal and policy work. Yet with existing tools, relevant results only appeared at an average click rank of 23, forcing officers to spend days trawling through noise before manually synthesising findings, a multi-day process that slowed decision-making.


Research
I spoke with 8 policy officers and 5 lawyers to understand their professional search behaviours and pain points with current tools.
Keyword search buries the right answer
Existing tools rely on traditional keyword search. Relevant results are mixed unpredictably with noise, so officers waste time sifting through pages of results before finding what they need.

One question means three databases
Hansard, Legislation, Judgments: each searched separately, each with its own interface. This multiplies effort and creates information gaps.
Synthesis is the real time sink
After finding results, officers spend even more time extracting quotes, building timelines, and categorising findings by legal principle.
Black-box AI outputs need to be fact-checked
Officers had tried other generative AI tools, but opaque summaries meant fact-checking the AI's outputs instead of doing analysis. That lack of trust meant they either gave up on the tools entirely or ended up with more work.
Search Relevance
A prettier UI wouldn't help if results were still buried. I worked closely with our engineers and Product Ops to improve search quality directly.

Test case development
Created evaluation datasets across all 3 sources for relevance testing.
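To show what relevance testing against such a dataset can look like, here's a minimal sketch. The EvalCase structure, the search signature, and the miss penalty are illustrative assumptions for this write-up, not the project's actual harness.

```python
# Minimal sketch of scoring average click rank against a labelled
# evaluation set; names and structure are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str            # a realistic officer question
    relevant_doc_id: str  # the document an expert marked as the right answer
    source: str           # "hansard" | "judgments" | "legislation"

def average_click_rank(cases: list[EvalCase],
                       search: Callable[[str], list[str]]) -> float:
    """Mean 1-based rank of the known-relevant document across queries.

    `search(query)` is assumed to return doc ids in ranked order;
    misses get a fixed penalty rank so they still hurt the average.
    """
    MISS_PENALTY = 100
    ranks = []
    for case in cases:
        results = search(case.query)
        if case.relevant_doc_id in results:
            ranks.append(results.index(case.relevant_doc_id) + 1)
        else:
            ranks.append(MISS_PENALTY)
    return sum(ranks) / len(ranks)
```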
Algorithm optimization
Engineers built a hybrid retrieval system combining semantic search (e5 embeddings), keyword search (BM25), and ColBERTv2 reranking. I helped tune the weights between these scoring methods to optimise for relevance.
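As a rough illustration of the blending step (the weights, the 0-1 min-max normalisation, and the function names here are assumptions for the sketch, not our production configuration):

```python
# Illustrative sketch of hybrid score blending before reranking.

def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalise so BM25 and embedding scores share a 0-1 scale."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_blend(dense: dict[str, float], sparse: dict[str, float],
                 w_dense: float = 0.6, w_bm25: float = 0.4,
                 top_k: int = 50) -> list[str]:
    """Blend per-document scores from the two retrievers and return the
    top_k doc ids to pass on to the ColBERTv2 reranking stage."""
    d, s = normalise(dense), normalise(sparse)
    blended = {doc: w_dense * d.get(doc, 0.0) + w_bm25 * s.get(doc, 0.0)
               for doc in d.keys() | s.keys()}
    return sorted(blended, key=blended.get, reverse=True)[:top_k]

# e.g. hybrid_blend({"doc_a": 0.91, "doc_b": 0.72},   # e5 cosine scores
#                   {"doc_b": 14.2, "doc_c": 9.8})    # BM25 scores
```

Tuning then amounts to sweeping w_dense and w_bm25 against the evaluation set above.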
Result
3.2 average click rank. Users found what they needed in the top 3 results, an improvement of nearly 20 positions over existing platforms.
Interface Design
Excerpts and metadata for quick relevance scanning
Officers needed to assess results without clicking into every one, so each result surfaces an excerpt and key metadata that can be scanned at a glance.
One-click analysis tools for common research tasks
Created synthesis tools for each source: Hansard (Speaker view, Policy timeline), Judgments (Legal principles, Case summary), and Legislation (Key areas, Summary). I wrote and stress-tested every prompt. Officers select the most relevant results and then run the analysis, keeping them in the loop (search, select, analyse) rather than having the AI generate without transparency.
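Conceptually, the select-then-analyse step works like the sketch below; the dataclass, tool labels, and instruction strings are placeholders rather than the real stress-tested prompts.

```python
# Hypothetical sketch of the search -> select -> analyse loop.
from dataclasses import dataclass

@dataclass
class SelectedResult:
    title: str
    excerpt: str
    citation: str  # e.g. a case citation or Hansard sitting date

# One instruction per analysis tool; placeholder strings, not the
# project's actual prompts.
ANALYSIS_INSTRUCTIONS = {
    "policy_timeline": "Build a chronological policy timeline from these Hansard excerpts.",
    "legal_principles": "Extract the legal principles established in these judgment excerpts.",
    "key_areas": "Summarise the key areas covered by these legislative provisions.",
}

def build_analysis_prompt(tool: str, selected: list[SelectedResult]) -> str:
    """Only officer-selected excerpts enter the prompt, so every claim in
    the output is traceable to a source the officer already vetted."""
    sources = "\n\n".join(
        f"[{i + 1}] {r.title} ({r.citation})\n{r.excerpt}"
        for i, r in enumerate(selected)
    )
    return (f"{ANALYSIS_INSTRUCTIONS[tool]} "
            f"Cite excerpts by their [number].\n\n{sources}")
```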

Suggested filters reduce refinement clicks
Ranking couldn't be perfect for every query, so the system suggests filters based on the query. A thorough filter panel handles deeper research needs.
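A simple rule-based version of that suggestion logic could look like this; the keyword-to-source mapping and the (filter, value) chip format are assumptions for illustration.

```python
# Illustrative sketch of query-driven filter suggestions.
import re

# Keyword -> dataset hints; the mapping is an assumption, not the
# project's actual rules.
SOURCE_HINTS = {
    "debate": "hansard",
    "speech": "hansard",
    "judgment": "judgments",
    "case": "judgments",
    "act": "legislation",
    "section": "legislation",
}

def suggest_filters(query: str) -> list[tuple[str, str]]:
    """Return (filter, value) chips to surface above the results."""
    chips = [("year", y) for y in re.findall(r"\b(?:19|20)\d{2}\b", query)]
    lowered = query.lower()
    chips += [("source", src) for hint, src in SOURCE_HINTS.items()
              if hint in lowered]
    return chips

# suggest_filters("parliamentary debates on data privacy in 2019")
# -> [("year", "2019"), ("source", "hansard")]
```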

One interface across Hansard, Judgments, and Legislation
Unified search with embedded results from other sources. One question, one search, no more switching between separate tools.
Domain expertise on the team is non-negotiable.
We brought on a legally trained intern and worked with a Courts officer to review the generated analysis. Without them, we couldn't have assessed search or analysis quality, and we would have missed nuances entirely.
Solve the core problem first.
Fancy AI features were tempting to build, but users just wanted to find landmark cases quickly. Improving search accuracy mattered more than flashy generation, so that's where we spent most of our time.
AI should complement expert mental models, not replace them.
We built transparent, controllable tools that support search → verify → synthesise, not opaque summaries officers have to fact-check. Putting users in control of what goes into the generation is almost as important as the output itself.
Growing up, I learnt that a warm bowl of noodles and cut fruits is a language of care. So here's some virtual care from me to you, and thank you for stopping by!
